Improving Statistical MT through Morphological Analysis

Authors

  • Sharon Goldwater
  • David McClosky
Abstract

In statistical machine translation, estimating word-to-word alignment probabilities for the translation model can be difficult due to the problem of sparse data: most words in a given corpus occur at most a handful of times. With a highly inflected language such as Czech, this problem can be particularly severe. In addition, much of the morphological variation seen in Czech words is not reflected in either the morphology or syntax of a language like English. In this work, we show that using morphological analysis to modify the Czech input can improve a Czech-English machine translation system. We investigate several different methods of incorporating morphological information, and show that a system that combines these methods yields the best results. Our final system achieves a BLEU score of .333, as compared to .270 for the baseline word-to-word system.
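The sparse-data argument can be made concrete with a toy example. The following is a minimal sketch, assuming a hypothetical hand-written lookup table (`TOY_ANALYSES`) in place of a real Czech morphological analyzer. It shows one possible input modification: replacing each inflected surface form with its lemma. The abstract does not say which modifications the final system combines, so this is not the authors' method. It only shows how collapsing inflections reduces the number of distinct word types and the share of types seen just once.

```python
from collections import Counter

# Hypothetical toy lexicon: surface form -> (lemma, morphological tag).
# A real system would use a full Czech analyzer; these entries are illustrative only.
TOY_ANALYSES = {
    "hrad": ("hrad", "N-sg-nom"),      # "castle"
    "hradu": ("hrad", "N-sg-gen"),
    "hradem": ("hrad", "N-sg-ins"),
    "hrady": ("hrad", "N-pl-nom"),
    "město": ("město", "N-sg-nom"),    # "town"
    "města": ("město", "N-sg-gen"),
    "městem": ("město", "N-sg-ins"),
}


def lemmatize(tokens):
    """Replace each known surface form by its lemma; unknown words pass through unchanged."""
    return [TOY_ANALYSES.get(t, (t, None))[0] for t in tokens]


def singleton_rate(tokens):
    """Fraction of word types that occur exactly once (a simple sparsity measure)."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)


corpus = "hrad hradu hradem hrady město města městem".split()
print(len(set(corpus)), singleton_rate(corpus))   # 7 1.0
lemmas = lemmatize(corpus)
print(len(set(lemmas)), singleton_rate(lemmas))   # 2 0.0
```

In the raw text, every inflected form is a separate type seen once, so an aligner has a single example for each. After lemmatization, the counts pool into two types. The trade-off is that information carried by the dropped tags (case, number) is lost unless it is reintroduced in some other way.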

Publication date: 2005